The CUDA Execution Model: Host vs. Device
AI032 Lesson 3

The CUDA execution model transforms your computer into a high-performance heterogeneous system. Imagine a Grand Director (the Host/CPU) and an Army of Thousands (the Device/GPU). The Director handles complex logic and decision-making, while the Army performs massive, repetitive tasks simultaneously.

1. The Architectural Divide

The Host is a latency-optimized CPU designed for complex control flow and serial tasks. Conversely, the Device is a throughput-optimized GPU containing thousands of simple cores designed to execute the same instruction across vast datasets simultaneously.

2. The Execution Rhythm

A CUDA program alternates between serial and parallel phases. Execution begins on the Host with serial code. When the program reaches a parallel kernel, it launches a Grid of threads onto the Device. Control returns to the Host once the Device finishes its massive workload.

[Figure: execution timeline — the HOST (CPU) runs Serial Code, then launches a Parallel Kernel (a Grid of Threads) on the DEVICE (GPU), then resumes Serial Code.]
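The rhythm above can be sketched in code. This is a minimal, hypothetical example (the kernel name `doubleElements` and the array size are illustrative, not from the lesson): serial Host code sets up memory, a kernel launch hands a Grid of threads to the Device, and the Host synchronizes before continuing.

```cuda
#include <cstdio>

// Kernel: runs on the Device; each thread doubles one element.
__global__ void doubleElements(float *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0f;
}

int main() {
    const int n = 1024;
    float *d_data;

    // Serial code on the Host: allocate and initialize Device memory.
    cudaMalloc(&d_data, n * sizeof(float));
    cudaMemset(d_data, 0, n * sizeof(float));

    // Parallel kernel: launch a Grid of threads onto the Device.
    int threadsPerBlock = 256;
    int blocks = (n + threadsPerBlock - 1) / threadsPerBlock;
    doubleElements<<<blocks, threadsPerBlock>>>(d_data, n);

    // Control returns to the Host; wait for the Device to finish
    // before the next serial phase uses the results.
    cudaDeviceSynchronize();

    cudaFree(d_data);
    printf("done\n");
    return 0;
}
```

Note that the kernel launch itself is asynchronous: the Host continues immediately, which is why the explicit `cudaDeviceSynchronize()` marks the boundary back to serial execution.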

3. Performance Specialization

The model leverages the strengths of both: the CPU manages system resources and complex branches, while the GPU executes SPMD (Single Program, Multiple Data) logic, in which every thread runs the same program on a different element of the data.
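The SPMD idea is easiest to see in a kernel body. In this sketch (a standard vector-add kernel, used here only as an illustration), all threads execute identical code, but each derives a unique global index from its block and thread coordinates and therefore touches a different data element.

```cuda
// SPMD in practice: every thread runs this same program,
// but computes its own global index to select its data element.
__global__ void vecAdd(const float *a, const float *b, float *c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // unique per thread

    // Guard: the Grid is usually rounded up past n, so trailing
    // threads must not read or write out of bounds.
    if (i < n)
        c[i] = a[i] + b[i];
}
```

One program, thousands of simultaneous instances — the "Army of Thousands" from the opening analogy, each soldier working on its own slice of the data.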
